Apparatus and method for supporting document data search

ABSTRACT

In a search support server, a related word extraction unit generates frequency information and co-occurrence information of keywords, a graph generation unit generates coordinate information of a spring graph including the keywords as nodes, on the basis of the co-occurrence information, a cluster generation unit groups the nodes into clusters and thereby generates cluster definition information, and a display information generation unit generates display information of the spring graph. In addition, an operation determination unit determines which operation is performed on the spring graph. Then, when a level change is instructed, the display information generation unit generates display information of the spring graph after the level is changed. When a node change is instructed, a cluster re-generation unit changes the cluster definition information and the frequency information. When a search query generation is instructed, a search query generation unit generates a search query with a keyword of a selected cluster.

BACKGROUND OF THE INVENTION

The present invention relates to an apparatus and a method for supporting a document data search. In particular, the present invention relates to an apparatus and a method for supporting a document data search based on a keyword.

Use of a search engine has become the mainstream of search methods for searching the Internet for desired information, for example. As the World Wide Web (WWW) grows larger and larger, the number of web pages displayed as the search results becomes larger as well. Accordingly, it is not always true that the information actually needed by the user is displayed at the top of the search results.

Users need to consider a search word from various perspectives as an input to the search engine. For example, users need to consider whether or not a particular search word is actually the appropriate one, what search word is appropriate to narrow down the search results into ones that include only desired information, approximately how many search words are appropriate, or the like. As described above, it is difficult for average users to determine what search word is appropriate as the input to a search engine.

In this respect, a technique for automatically generating a search query including a combination of search words has been proposed heretofore (refer to Koji Eguchi and two others, “Incremental Query Expansion Considering Adaptation to User's Behavior Based on Clustering the Search Results,” Research Report of Information Processing Society of Japan, 98-DBS-114, Vol. 98, No. 2, Information Processing Society of Japan, pp. 43-48, Jan. 19, 1998 (hereinafter, referred to as Non-Patent Document 1), and Hideki Mima and another, “Development of Web Information Search Support System for Children,” Feb. 14, 2004, INFORMATION-TECHNOLOGY PROMOTION AGENCY, JAPAN (on line), Internet URL: http://www.ipa.go.jp/SPC/report/03fy-pro/mito/15-1198d.pdf (search date, Apr. 10, 2008) (hereinafter, referred to as Non-Patent Document 2), for example). According to the technique disclosed in Non-Patent Document 1, a huge amount of search results are grouped into clusters, and then, the query is gradually modified each time a user confirms a cluster as an appropriate one. In the technique disclosed in Non-Patent Document 2, search path information is generated by use of ontology information or the like and then presented to the user. The query is then expanded when the user selects a search path.

In addition, the following shows other techniques for supporting a search. For example, disclosed is a technique for searching multiple information sources (refer to Japanese Patent Application Publication No. 2000-250935 (hereinafter referred to as Patent Document 1), for example). In this technique, locations of information and classification information are managed by use of a classification tree, and then, information including a search item and a search condition for finding the structure of the classification tree and an information source are managed as a class definition. Then, a search is performed on the multiple information sources by tracking back entry information stored in the classification tree of the information. In addition, disclosed is a technique for automatically displaying information indicating what hit rate can be obtained as a result of search on a set of original search results by using a new search condition, or the like (refer to Kazuhiro Hayakawa and two others, “A Visual User Interface for Refining Search Engine Results,” Research Report of Information Processing Society of Japan, 98-HI-76, 98-IM-33, Vol. 98, No. 9, Information Processing Society of Japan, pp. 25-30, Jan. 29, 1998 (hereinafter, referred to as Non-Patent Document 3), for example). Moreover, disclosed is a technique for visually arranging and displaying web contents in search results with their web-specific link relationships by use of characteristic information given to the search results (refer to Kobayasi Aki, “User Interface for Search in Web Link Space,” 2003, Telecommunications Advancement Foundation (on line), Internet URL: http://www.taf.or.jp/publication/kjosei_(—)20/pdf/p453.pdf (search date, Apr. 1, 2008) (hereinafter, referred to as Non-Patent Document 4), for example). In addition, there is another technique for displaying a spring graph including search word candidates as nodes and co-occurrence relationships between the search word candidates as the edges (refer to Nagahata Hiroomi and another, “Query Suggestion Using Plural Visualization Techniques,” Mar. 10, 2008, Institute of Electronics, Information and Communication Engineers (on line), Internet URL: http://www.ieice.org/˜de/DEWS/DEWS2008/proceedings/files/b5/b5-5.pdf (search date, Apr. 1, 2008) (hereinafter, referred to as Non-Patent Document 5), for example).

As described above, various search support techniques have been proposed heretofore.

However, the technique in Non-Patent Document 1 merely corrects a query according to an evaluation performed by the user on appropriateness of a document cluster, but does not correct a query by allowing a user to specify an appropriate keyword for a search in accordance with a user request. In addition, the technique disclosed in Non-Patent Document 2 performs query expansion on the basis of a search path generated using ontology information or the like, but does not perform query expansion on the basis of a search path including a keyword appropriate for a search in accordance with a user request. Accordingly, neither of these techniques resolves the problem that an appropriate keyword in accordance with a user request cannot be specified to perform a search.

Likewise, the techniques disclosed in Patent Document 1 and Non-Patent Documents 3 to 5 do not provide means for performing a search in accordance with a user request by allowing a user to specify a keyword appropriate for the search.

SUMMARY OF THE INVENTION

An object of the present invention is to perform a search in accordance with a user request by allowing a user to specify a keyword appropriate for the search through a simple operation.

In order to achieve the aforementioned object, the present invention provides an apparatus for supporting a document data search based on a keyword. The apparatus includes: an extraction unit for extracting a plurality of keywords from search target document data; a graph generation unit for generating a graph including a plurality of objects respectively representing the plurality of keywords extracted by the extraction unit and being classified into a plurality of clusters; and a search condition statement generation unit for generating a search condition statement by use of a keyword, in accordance with a user operation to select a specific cluster among the plurality of clusters in the graph generated by the graph generation unit, the keyword being represented by an object that belongs to the specific cluster.

In this apparatus, the graph generation unit may generate the graph in which the plurality of objects are arranged respectively at such positions that a distance between two objects of the plurality of objects corresponds to a degree of co-occurrence of two keywords represented by the two objects, and in which the plurality of objects are classified into the plurality of clusters on the basis of information on the positions.

Furthermore, in this apparatus, among keywords that appear in the search target document data, the extraction unit may extract keywords that appear at a specific level of frequency, as the plurality of keywords corresponding to the specific level, and in accordance with a user operation to specify the specific level, the graph generation unit may generate the graph including a plurality of objects respectively representing the plurality of keywords corresponding to the specific level and being classified into a plurality of clusters.

Moreover, in this apparatus, in accordance with a user operation on a specific object included in the graph, the graph generation unit may add a change related to the specific object to the graph. Here, the change related to the specific object may be a change to cause the specific object belonging to a first cluster to belong to a second cluster; a change to merge the specific object and an object different from the specific object and thereby to obtain a single object representing a keyword represented by the specific object and a keyword represented by the different object; or a change to delete the specific object.

In addition, in this apparatus, the search condition statement generation unit may generate the search condition statement including the keyword represented by the object belonging to the specific cluster as either an AND condition or an OR condition, the choice between the two being determined in advance in accordance with appearance frequency of the keyword. Furthermore, in the apparatus, the search condition statement generation unit may generate the search condition statement including a keyword represented by an object that belongs to a cluster other than the specific cluster, as a NOT condition.

In addition, the present invention provides an apparatus for supporting a document data search based on a keyword. The apparatus includes: an extraction unit for extracting a plurality of keywords from search target document data; a graph generation unit for generating a graph including a plurality of objects respectively representing the plurality of keywords extracted by the extraction unit, the plurality of objects being arranged respectively at such positions that a distance between two objects of the plurality of objects corresponds to a degree of co-occurrence of two keywords represented by the two objects, and the plurality of objects being classified into a plurality of clusters on the basis of information on the positions; and a search condition statement generation unit for generating a search condition statement in accordance with a user operation to select a specific cluster among the plurality of clusters in the graph generated by the graph generation unit, the search condition statement including a keyword represented by an object that belongs to the specific cluster, as any one condition of an AND condition and an OR condition, and also a keyword represented by an object that belongs to a cluster other than the specific cluster, as a NOT condition.

Moreover, the present invention provides a method of supporting a document data search based on a keyword. The method includes the steps of: extracting a plurality of keywords from search target document data; generating a graph including a plurality of objects respectively representing the extracted plurality of keywords and being classified into a plurality of clusters; and in accordance with a user operation to select a specific cluster among the plurality of clusters in the generated graph, generating a search condition statement by use of a keyword represented by an object that belongs to the specific cluster.

Furthermore, the present invention provides a method of supporting a document data search based on a keyword. The method includes the steps of: extracting a plurality of keywords from search target document data; generating a graph including a plurality of objects respectively representing the extracted plurality of keywords and being classified into a plurality of clusters; in accordance with a user operation for a specific object included in the generated graph, adding a change related to the specific object to the graph; and in accordance with a user operation to select a specific cluster among the plurality of clusters in the graph after the change is added thereto, generating a search condition statement by use of a keyword represented by an object that belongs to the specific cluster.

In addition, the present invention provides a program causing a computer to function as an apparatus for supporting a document data search based on a keyword, the program causing the computer to function as: an extraction unit for extracting a plurality of keywords from search target document data; a graph generation unit for generating a graph including a plurality of objects respectively representing the plurality of keywords extracted by the extraction unit and being classified into a plurality of clusters; and a search condition statement generation unit for generating, in accordance with a user operation to select a specific cluster among the plurality of clusters in the graph generated by the graph generation unit, a search condition statement by use of a keyword represented by an object that belongs to the specific cluster.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings.

FIG. 1 is a diagram showing an overall configuration of a computer system according to an embodiment of the present invention.

FIG. 2 is a diagram showing a functional configuration of a search support server according to the embodiment of the present invention.

FIG. 3 is a flowchart showing an operation example of the search support server according to the embodiment of the present invention.

FIG. 4 is a specific example of frequency information generated according to the embodiment of the present invention.

FIG. 5 is a specific example of a co-occurrence matrix generated according to the embodiment of the present invention.

FIG. 6 is a specific example of a spring graph generated according to the embodiment of the present invention.

FIG. 7 is a specific example of cluster definition information generated according to the embodiment of the present invention.

FIG. 8 is a diagram showing a specific example of a spring graph generated according to the embodiment of the present invention when graphics indicating clusters are added thereto.

FIG. 9 is a flowchart showing an operation example of the search support server according to the embodiment of the present invention.

FIGS. 10A and 10B are diagrams showing a specific example of a change in the spring graph when a hierarchy level change instruction is issued in the embodiment of the present invention.

FIG. 11 is a diagram showing a specific example of a change in the spring graph when an instruction to move a node between clusters is issued in the embodiment of the present invention.

FIG. 12 is a diagram showing a specific example of cluster definition information after the instruction to move the node between the clusters is issued in the embodiment of the present invention.

FIG. 13 is a diagram showing a hardware configuration of a computer to which the embodiment of the present invention can be applied.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

A detailed description will be hereinafter given of the best mode for carrying out the invention (hereinafter referred to as an “embodiment”) with reference to the accompanying drawings.

Firstly, a description will be given of a computer system to which the present embodiment is applied.

FIG. 1 is a diagram showing an overall configuration example of such a computer system.

As illustrated, this computer system includes a client 10, a search server 20 and a search support server 30, all of which are connected with one another via a network 80.

The client 10 is a terminal device such as a PC used by a user. To be more specific, the client 10 is a terminal device used by a user for inputting a keyword to the search server 20 and then for obtaining a search result from the network 80, or for performing a search again with a search condition changed by use of a spring graph transmitted from the search support server 30. Note that, although only one client is shown in FIG. 1, two or more clients may be used.

The search server 20 is a so-called search engine. The search server 20 is a server computer that transmits a web page to a request source as a search result upon receipt of a search request with a search condition, the web page being an example of document data that matches the search condition. To be more specific, upon receipt of a search request with a search word from the client 10, the search server 20 transmits a web page related to the search word to the client 10. In addition, upon receipt of a search request with a search condition from the search support server 30, the search server 20 transmits a web page that matches the search condition to the search support server 30.

The search support server 30 is a server computer for supporting a search to be performed by the search server 20. To be more specific, the search support server 30 receives a search result from the search server 20 and transmits to the client 10 a spring graph in which keywords including a search word and related words related to the search word are grouped into clusters. Then, when a cluster selected by the user on the spring graph is notified to the search support server 30 by the client 10, the search support server 30 generates a search query, which is an example of a search condition statement, by use of a keyword in the cluster, and then transmits the search query to the search server 20.

It should be noted that although the search server 20 and the search support server 30 are described as different server computers herein, these servers can be provided in a single server computer.

The network 80 is communication means used for transmission and reception of information. Examples of the network 80 include the Internet and a local area network (LAN).

In a computer system having the aforementioned configuration, a user enters search words (Word1, Word2) into the search server 20 by use of the client 10 as usual (A). In response to this, a search result is not only returned to the client 10, but also is passed to the search support server 30 (B). Then, the search support server 30 returns a spring graph to the client 10 as a user interface (C), the spring graph including related words (Word3, Word4, Word5) acting as nodes obtainable from the search result in addition to the search words. The user is allowed to freely make a change in this spring graph and also to abstractly select an area that the user would like to search. The search support server 30 thereby automatically generates a search query and inputs the search query into the search server 20. Specifically, in this embodiment, the search support server 30 displays a relevancy between keywords as a spring graph, and according to an operation performed by the user on the spring graph, the search support server 30 automatically generates a new search query.

In this respect, a description will be first given of a functional configuration of the search support server 30 according to the present embodiment.

FIG. 2 is a block diagram showing a functional configuration example of the search support server 30 according to the present embodiment.

As illustrated, the search support server 30 includes a receiver 31, a related word extraction unit 32, a frequency information storage unit 33, a co-occurrence information storage unit 34, a graph generation unit 35, a coordinate information storage unit 36, a cluster generation unit 37, a cluster definition information storage unit 38, a display information generation unit 39, and a transmitter 41. In addition, the search support server 30 includes an operation determination unit 42, a cluster re-generation unit 43 and a search query generation unit 44.

The receiver 31 receives a search result from the search server 20 and also receives information related to an operation performed on a spring graph (hereinafter, referred to as “operation information”) from the client 10.

The related word extraction unit 32 performs a morphological analysis on a web page of the search result, then generates a related word group by use of a frequency analysis, term frequency-inverse document frequency (TF-IDF), or the like and generates frequency information related to appearance frequency of these related words. In addition, the related word extraction unit 32 finds a degree of co-occurrence between keywords including search words and related words and generates co-occurrence information formed of a co-occurrence matrix including a degree of co-occurrence stored in each cell of the matrix. Here, the degree of co-occurrence can be found by use of a known calculation method. For example, the degree of co-occurrence C(k1, k2) of a keyword k1 and a keyword k2 can be found by the following expression: “C(k1, k2) = P(k1, k2)/Pa.” Here, “P(k1, k2)” represents the number of pages including both the keyword k1 and the keyword k2, and “Pa” represents the total number of pages. In this embodiment, the related word extraction unit 32 is provided as an example of an extraction unit that extracts multiple keywords from document data.
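The expression C(k1, k2) = P(k1, k2)/Pa can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each web page has already been reduced to a set of keywords; the function name `cooccurrence_matrix` and the sample pages are hypothetical and do not come from the embodiment.

```python
from itertools import combinations


def cooccurrence_matrix(pages, keywords):
    """Return {(k1, k2): C(k1, k2)} where C(k1, k2) = P(k1, k2) / Pa.

    pages    -- list of sets, each set holding the keywords found on one page
                (an assumed, simplified page representation)
    keywords -- keywords for which pairwise degrees are computed
    """
    total_pages = len(pages)  # Pa: total number of pages
    matrix = {}
    for k1, k2 in combinations(keywords, 2):
        # P(k1, k2): number of pages that contain both keywords
        both = sum(1 for page in pages if k1 in page and k2 in page)
        degree = both / total_pages if total_pages else 0.0
        # Store both orderings so the matrix is symmetric
        matrix[(k1, k2)] = degree
        matrix[(k2, k1)] = degree
    return matrix


# Hypothetical search result: four pages, already reduced to keyword sets
pages = [
    {"Word1", "Word2", "Word3"},
    {"Word1", "Word4"},
    {"Word1", "Word2"},
    {"Word2", "Word3"},
]
matrix = cooccurrence_matrix(pages, ["Word1", "Word2", "Word3", "Word4"])
print(matrix[("Word1", "Word2")])  # 2 of 4 pages contain both keywords
```

A dictionary keyed by keyword pairs is used here only for readability; a two-dimensional array indexed by keyword position would serve equally well as the co-occurrence matrix.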

The frequency information storage unit 33 stores frequency information generated by the related word extraction unit 32.

The co-occurrence information storage unit 34 stores co-occurrence information generated by the related word extraction unit 32.

The graph generation unit 35 generates a spring graph by use of degrees of co-occurrence between keywords. In this spring graph, a node, which is an example of an object, corresponds to a keyword. In addition, an edge is placed between two nodes corresponding to two keywords having a degree of co-occurrence exceeding a predetermined threshold value. At this time, the distance between the two nodes is set so as to be inversely proportional to the degree of co-occurrence of the two keywords corresponding to these nodes. Moreover, the spring graph has a hierarchy structure. In the spring graph, a node corresponding to a keyword of high appearance frequency is set to be an upper level node. In addition, a node corresponding to a keyword being closely related to the upper level node and being of relatively low appearance frequency as compared with that of the upper level node is set to be a lower level node. In this embodiment, however, the graph generation unit 35 does not generate a spring graph itself, but generates coordinate information used for generating a spring graph. To be more specific, the graph generation unit 35 generates coordinate information indicating a position of a node included in the spring graph (hereinafter, referred to as “node coordinate information”) and also coordinate information indicating a position of an edge (hereinafter, referred to as “edge coordinate information”). It should be noted that characteristics of a keyword that can be obtained from a corpus or the like may be used for generating a spring graph.
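The edge rule described above (an edge only where the degree of co-occurrence exceeds a threshold, with a distance inversely proportional to that degree) can be sketched as follows. This is a minimal illustration, not the layout procedure of the embodiment: the function name `spring_edges`, the `scale` constant of proportionality and the sample values are assumptions, and the actual computation of node coordinates is left out.

```python
def spring_edges(cooccurrence, threshold, scale=1.0):
    """Return {(k1, k2): target_distance} for keyword pairs to be linked.

    cooccurrence -- symmetric mapping {(k1, k2): degree of co-occurrence}
    threshold    -- an edge is placed only if the degree exceeds this value
    scale        -- assumed constant of proportionality for the distance
    """
    edges = {}
    for (k1, k2), degree in cooccurrence.items():
        # Visit each unordered pair once and skip degrees of zero,
        # which would make the inverse undefined
        if k1 < k2 and degree > threshold and degree > 0:
            # Distance is inversely proportional to the degree of co-occurrence
            edges[(k1, k2)] = scale / degree
    return edges


# Hypothetical co-occurrence information for three keywords
cooccurrence = {
    ("Word1", "Word2"): 0.5, ("Word2", "Word1"): 0.5,
    ("Word1", "Word3"): 0.1, ("Word3", "Word1"): 0.1,
}
edges = spring_edges(cooccurrence, threshold=0.2)
print(edges)  # only the Word1-Word2 pair exceeds the threshold
```

The resulting target distances could then be handed to any spring (force-directed) layout routine to obtain the node coordinate information and edge coordinate information.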

The coordinate information storage unit 36 stores the node coordinate information and the edge coordinate information generated by the graph generation unit 35, and cluster coordinate information to be described later.

The cluster generation unit 37 groups a set of nodes existing within a predetermined distance as a cluster and generates cluster definition information associating keywords with a cluster. As a clustering technique, a k-means method or the like may be used. In addition, the cluster generation unit 37 generates coordinate information indicating a position of a cluster on a spring graph (hereinafter, referred to as “cluster coordinate information”). It should be noted that a user can freely modify the range, the depth or the like of a cluster.
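As one possible reading of "nodes existing within a predetermined distance", the following Python sketch groups nodes by chaining together any two nodes whose coordinates lie within a given distance of each other. Note that this is a simple distance-threshold grouping chosen for brevity, not the k-means method mentioned above; the function name `cluster_nodes`, the coordinates and the numeric cluster identifiers are all hypothetical.

```python
import math


def cluster_nodes(coords, max_distance):
    """Return cluster definition information as {keyword: cluster_id}.

    coords       -- node coordinate information, {keyword: (x, y)}
    max_distance -- assumed "predetermined distance" for grouping
    """
    keywords = sorted(coords)
    parent = {k: k for k in keywords}  # union-find forest

    def find(k):
        # Follow parent links to the representative of k's group
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # Join every pair of nodes that lie within the distance
    for i, a in enumerate(keywords):
        for b in keywords[i + 1:]:
            if math.dist(coords[a], coords[b]) <= max_distance:
                parent[find(a)] = find(b)

    # Number the groups 1, 2, ... in order of first appearance
    cluster_ids = {}
    definition = {}
    for k in keywords:
        root = find(k)
        cluster_ids.setdefault(root, len(cluster_ids) + 1)
        definition[k] = cluster_ids[root]
    return definition


# Hypothetical node coordinate information
coords = {
    "Word1": (0.0, 0.0), "Word2": (1.0, 0.0),
    "Word3": (10.0, 10.0), "Word4": (10.0, 11.0),
}
definition = cluster_nodes(coords, max_distance=2.0)
print(definition)
```

With k-means, the number of clusters would be fixed in advance instead of the distance; which of the two suits the spring graph better is a design choice the description leaves open.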

The cluster definition information storage unit 38 stores cluster definition information generated by the cluster generation unit 37.

The display information generation unit 39 generates display information (contents of a web page, for example) for displaying a spring graph on the client 10, on the basis of the node coordinate information, the edge coordinate information and the cluster coordinate information, which are stored in the coordinate information storage unit 36.

The transmitter 41 transmits the display information generated by the display information generation unit 39 to the client 10 and also transmits a search query generated by the search query generation unit 44 to the search server 20.

The operation determination unit 42 determines, on the basis of operation information received by the receiver 31 from the client 10, the content of an instruction issued by a user. Here, the instructions that can be issued by a user are as follows. The first is an instruction to change the hierarchy level of the cluster being displayed to a deeper hierarchy level of a cluster using a more detailed keyword with a certain keyword as the base (hereinafter, referred to as a “hierarchy level change instruction”). The second is an instruction to generate a new cluster by operating the generated spring graph and adding thereto a change related to a node (hereinafter, referred to as a “node change instruction”). The third is an instruction to select one cluster and automatically generate a search query (hereinafter, referred to as a “search query generation instruction”).

The cluster re-generation unit 43 determines the content of a node change in a case where the operation determination unit 42 determines that a node change instruction has been issued. Then, the cluster re-generation unit 43 generates a new cluster in accordance with the content of the change. Here, the content of a node change includes changing of a cluster which a certain node belongs to, merging of two nodes, deletion of a certain node and the like.
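The three kinds of node change named above can be sketched as operations on cluster definition information. The sketch assumes that the cluster definition information is a plain mapping from keyword to cluster identifier, that a merged node is labeled by joining the two keywords with a space, and that it stays in the first node's cluster; none of these details is specified by the embodiment, and the function names are hypothetical.

```python
def move_node(definition, keyword, new_cluster):
    """Make the node for `keyword` belong to `new_cluster`."""
    updated = dict(definition)  # leave the original definition untouched
    updated[keyword] = new_cluster
    return updated


def merge_nodes(definition, keyword_a, keyword_b):
    """Merge two nodes into a single node representing both keywords.

    Assumption: the merged node keeps the cluster of `keyword_a` and is
    labeled with both keywords joined by a space.
    """
    updated = dict(definition)
    cluster = updated.pop(keyword_a)
    updated.pop(keyword_b)
    updated[f"{keyword_a} {keyword_b}"] = cluster
    return updated


def delete_node(definition, keyword):
    """Remove the node for `keyword` from the cluster definition."""
    updated = dict(definition)
    del updated[keyword]
    return updated


# Hypothetical cluster definition information: keyword -> cluster identifier
definition = {"Word1": 1, "Word2": 1, "Word3": 2}
print(move_node(definition, "Word3", 1))
print(merge_nodes(definition, "Word1", "Word2"))
print(delete_node(definition, "Word2"))
```

Each function returns a new mapping rather than modifying its argument, so the definition before the change remains available, for example if the user cancels the operation.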

It is to be noted that in this embodiment, the graph generation unit 35, the cluster generation unit 37 and the cluster re-generation unit 43 are provided as an example of a graph generation unit that generates a graph including multiple objects classified into multiple clusters, and that adds a change related to a specific object to the graph.

In a case where the operation determination unit 42 determines that a search query generation instruction has been issued, the search query generation unit 44 automatically generates a search query from the selected cluster, so that the following conditions are satisfied, for example. The first condition is to include in the search query, as an AND condition, a keyword equivalent to an upper level node among keywords belonging to the selected cluster. The second condition is to include in the search query, as an AND condition, a condition including keywords equivalent to a lower level node among the keywords belonging to the selected cluster, the keywords being connected by OR. The third condition is to include in the search query, as a NOT condition, a keyword having an extremely low degree of relevancy with a keyword included in the selected cluster. Here, the number of keywords to be included as an AND condition depends on the number of keywords in the selected cluster. If a certain web page does not include a keyword to be included as an AND condition, the web page is not found during the search. Accordingly, the number of keywords to be combined as an AND condition is kept at the minimum as a policy, and the number thereof is to be determined by the user. Moreover, the number of keywords to be combined as a NOT condition is not limited since the keywords are useful in excluding a web page not related to the search.
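The three conditions above can be combined into a query string as in the following minimal sketch. The textual syntax (`AND`, `OR`, `NOT`, parentheses) is an assumption made for illustration, since the query syntax accepted by the search server 20 is not specified; the function name `build_query` and its parameters are likewise hypothetical.

```python
def build_query(upper, lower, excluded):
    """Assemble a search query from the keywords of a selected cluster.

    upper    -- keywords of upper level nodes, each added as an AND condition
    lower    -- keywords of lower level nodes, joined by OR and added
                as a single AND condition
    excluded -- keywords with very low relevancy, each added as a NOT condition
    """
    positive = list(upper)
    if lower:
        # Lower level keywords form one OR group inside the AND chain
        positive.append("(" + " OR ".join(lower) + ")")
    clauses = [" AND ".join(positive)] if positive else []
    # Every excluded keyword becomes its own NOT condition
    clauses.extend(f"NOT {word}" for word in excluded)
    return " ".join(clauses)


query = build_query(
    upper=["Word1"],
    lower=["Word3", "Word4"],
    excluded=["Word5"],
)
print(query)  # Word1 AND (Word3 OR Word4) NOT Word5
```

Keeping `upper` short reflects the policy stated above: every additional AND keyword removes pages from the result, whereas the list of NOT keywords can grow without that risk.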

Next, a description will be given in detail of operations of the search support server 30 in the present embodiment. Here, as the operations of the search support server 30, there are an operation for transmitting a spring graph to the client 10 after receiving a search result from the search server 20 and an operation for transmitting a search query to the search server 20 after receiving operation information on the spring graph from the client 10. In this respect, these two operations are separately and sequentially described hereinafter.

First, a description will be given of the operation for transmitting a spring graph to the client 10 after receiving a search result from the search server 20.

FIG. 3 shows a flowchart of an operation example of the search support server 30 in the aforementioned case.

In the search support server 30, the receiver 31 first receives a web page of a search result from the search server 20 and passes the web page to the related word extraction unit 32 (step 301).

Then, the related word extraction unit 32 extracts keywords from the web page passed from the receiver 31, then generates frequency information including appearance frequency of the keywords and frequency levels determined by the appearance frequency, and then stores the frequency information in the frequency information storage unit 33 (step 302). Here, the extraction of keywords from the web page may be performed by use of a known morphological analysis method. The keywords to be extracted here include a search word initially inputted by the user to obtain the search result and a related word related to the search word. In addition, the related word extraction unit 32 calculates a degree of co-occurrence between the keywords by use of a known method, then generates co-occurrence information including a co-occurrence matrix formed of the degrees of co-occurrence stored in a matrix format, for example, and then stores the co-occurrence information in the co-occurrence information storage unit 34 (step 303). Here, the co-occurrence matrix is preferably generated for each of the frequency levels found in step 302. It should be noted that specific examples of the frequency information and the co-occurrence information will be described later.

Next, the graph generation unit 35 generates coordinate information required for drawing a spring graph, by use of the co-occurrence information stored in the co-occurrence information storage unit 34, and then stores the generated coordinate information in the coordinate information storage unit 36 (step 304). Here, the spring graph is a graph representing the keywords by nodes and the degrees of co-occurrence between the keywords by edges between the nodes. Accordingly, the graph generation unit 35 generates the node coordinate information indicating a drawing position of a node and the edge coordinate information indicating a drawing position of an edge. Furthermore, it is preferable that the spring graph be generated for each hierarchy level in advance, so that in a case where a display request for a spring graph of a deeper hierarchy level is issued, a spring graph of the corresponding hierarchy level can be promptly displayed. The spring graph for each hierarchy level can be generated in this manner by use of a co-occurrence matrix for each of the frequency levels stored in the co-occurrence information storage unit 34. Incidentally, when generating a spring graph, the graph generation unit 35 places, for all of the nodes, a hidden link indicating relevancy between keywords represented by the nodes. This is because the hidden link is used implicitly for specifying a keyword to be included as a NOT condition when a search query is generated thereafter. A specific example of a spring graph will be described later.

Thereafter, the cluster generation unit 37 groups the nodes into clusters by a known method (a k-means method, for example) by use of the node coordinate information stored in the coordinate information storage unit 36. The cluster generation unit 37 then generates cluster definition information associating the keywords corresponding to the nodes with the clusters into which the nodes are classified, and then stores the cluster definition information in the cluster definition information storage unit 38 (step 305). It should be noted that a specific example of the cluster definition information will be described later. Moreover, a drawing position of a graphic is determined on the graph generated by use of the node coordinate information and the edge coordinate information stored in the coordinate information storage unit 36 so that a node can be included in a graphic indicating the corresponding cluster. Then, the cluster coordinate information indicating this drawing position is added in the coordinate information storage unit 36 (step 306).

When the node coordinate information, the edge coordinate information and the cluster coordinate information are stored in the coordinate information storage unit 36, the display information generation unit 39 generates display information for displaying an image in which a graphic indicating a cluster is superimposed on the spring graph, by use of these pieces of coordinate information, and passes the display information to the transmitter 41 (step 307). It should be noted that since this case is for displaying the initial spring graph, display information for displaying a level 1 spring graph is generated by use of the coordinate information corresponding to the uppermost hierarchy level (level 1).

Next, the transmitter 41 transmits the display information for displaying the level 1 spring graph, which is passed from the display information generation unit 39, to the client 10 (step 308). The spring graph is thereby displayed on an unillustrated display of the client 10.

Next, the aforementioned operation will be described using a specific example. It is to be noted that, hereinafter, “Java” is a trademark or a registered trademark of Sun Microsystems, Inc., in the United States or other countries, and “Windows” is a registered trademark of Microsoft Corporation in the United States or other countries.

First, the frequency information to be generated in step 302 will be described.

FIG. 4 is a diagram showing a specific example of the frequency information.

As illustrated, the frequency information is information associating keywords, appearance frequencies and frequency levels with each other. Here, the appearance frequency may be determined by a known method, but is regulated so that its maximum value is 100. Each of the frequency levels is a level determined, from among multiple levels prepared in advance, in accordance with the corresponding appearance frequency. For example, suppose that the following rule is determined in advance. When the appearance frequency is 70 or greater but not more than 100, the appearance frequency is classified as frequency level 1. Moreover, when the appearance frequency is 40 or greater but less than 70, the appearance frequency is classified as frequency level 2.
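The example rule above can be sketched in Python as follows. The function name `frequency_level`, the sample frequencies and the treatment of frequencies below 40 as level 3 are assumptions for illustration; the description defines only levels 1 and 2.

```python
def frequency_level(appearance_frequency):
    """Map an appearance frequency (0 to 100) to a frequency level."""
    if not 0 <= appearance_frequency <= 100:
        raise ValueError("appearance frequency is regulated to at most 100")
    if appearance_frequency >= 70:
        return 1  # 70 or greater but not more than 100
    if appearance_frequency >= 40:
        return 2  # 40 or greater but less than 70
    # Assumption: the text defines no rule below 40; level 3 is used here.
    return 3

# Hypothetical appearance frequencies (not the values of FIG. 4).
appearance = {"Java": 95, "C": 70, "scanf": 45, "template": 40}
# Frequency information: keyword -> (appearance frequency, frequency level).
frequency_info = {kw: (f, frequency_level(f)) for kw, f in appearance.items()}
```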

Furthermore, the co-occurrence information to be generated in step 303 will be described.

FIG. 5 is a diagram showing a specific example of a co-occurrence matrix included in the co-occurrence information.

As illustrated, in the co-occurrence matrix, a keyword is set in each cell of the first row and each cell of the first column. Then, in the cell at which the column containing a given cell of the first row intersects the row containing a given cell of the first column, the degree of co-occurrence of the two keywords set in these two cells is set. It should be noted that in FIG. 5, each cell in which a degree of co-occurrence not less than “20” is set is shown with a bold frame. This is because an assumption is made that an edge is placed between nodes having a degree of co-occurrence not less than “20.”
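As a minimal sketch of how edges could be derived from such a matrix, the following Python fragment stores the degrees of co-occurrence in a dictionary and keeps the pairs at or above the threshold of 20. The degrees shown are hypothetical and are not the values of FIG. 5; the names `degree` and `edges` are likewise assumptions.

```python
EDGE_THRESHOLD = 20  # an edge is placed when the degree is not less than 20

# Hypothetical degrees of co-occurrence for unordered keyword pairs.
cooccurrence = {
    ("Perl", "Ruby"): 35,
    ("Perl", "Python"): 28,
    ("Ruby", "Python"): 30,
    ("html", "Java"): 40,
    ("Perl", "Java"): 12,
    ("C", "C++"): 45,
}

def degree(matrix, a, b):
    """Look up the symmetric degree of co-occurrence of two keywords."""
    return matrix.get((a, b), matrix.get((b, a), 0))

def edges(matrix, threshold=EDGE_THRESHOLD):
    """Return the keyword pairs between which an edge is placed."""
    return sorted(pair for pair, d in matrix.items() if d >= threshold)
```

Under these assumed values, no edge is placed between “Perl” and “Java” because their degree of co-occurrence is below the threshold.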

Incidentally, as is clear from the frequency information shown in FIG. 4, the illustrated co-occurrence matrix is a co-occurrence matrix in which degrees of co-occurrence between the keywords of frequency level 1 are set (level 1-level 1 co-occurrence matrix). However, for generating a spring graph for each hierarchy level, as described above, a co-occurrence matrix for each frequency level is preferably generated. Although it is not illustrated, a co-occurrence matrix in which degrees of co-occurrence between a keyword of the frequency level 1 and a keyword of the frequency level 2 are set (level 1-level 2 co-occurrence matrix), or in which degrees of co-occurrence between a keyword of the frequency level 2 and another keyword of the frequency level 2 are set (level 2-level 2 co-occurrence matrix), may be generated.

Furthermore, a description will be given of a spring graph based on the node coordinate information and the edge coordinate information generated in step 304.

FIG. 6 is a diagram showing a specific example of the spring graph. This spring graph is a spring graph representing relevancy between keywords of the frequency level 1, and is generated from the level 1-level 1 co-occurrence matrix shown in FIG. 5. In the spring graph, the node coordinate information is set so that the higher the degree of co-occurrence, the shorter the distance between the nodes. For example, a rule that the reciprocal of the degree of co-occurrence of two nodes is used as the distance between the nodes may be employed.
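The reciprocal rule mentioned above can be written as a short Python sketch. The function name `ideal_distance` and the handling of a zero degree of co-occurrence (no target distance at all) are assumptions; the description does not state what happens when two keywords never co-occur.

```python
def ideal_distance(degree_of_cooccurrence):
    """Target distance between two nodes: reciprocal of their co-occurrence."""
    if degree_of_cooccurrence <= 0:
        # Assumption: unrelated keywords impose no distance constraint.
        return None
    return 1.0 / degree_of_cooccurrence
```

A layout routine would then try to place each pair of nodes at approximately this distance, so that a pair with a degree of 40 ends up closer together than a pair with a degree of 20.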

In this example, an assumption is made that a user initially inputs “Ruby,” “Java” and “C++” as the search words and performs a search. As a result of the search, the usual search result is displayed, but in this embodiment, a keyword having a high degree of co-occurrence with each of “Ruby,” “Java” and “C++” is additionally extracted as a related word. Specifically, related words indicating a script language such as “Perl” and “Python” for “Ruby,” a related word indicating a web system language such as “html” for “Java,” and related words indicating the C language such as “C” and “VC++” for “C++” are extracted. Then, the relevancy between these keywords is visualized.

Furthermore, the cluster definition information generated in step 305 will be described.

FIG. 7 is a diagram showing a specific example of the cluster definition information.

As illustrated, the cluster definition information is information associating keywords with clusters to which the keywords belong.

Incidentally, as is clear from the frequency information shown in FIG. 4, the illustrated cluster definition information is one that associates the keywords of the frequency level 1 with the clusters in the spring graph of the hierarchy level 1. However, in a case where a spring graph for each hierarchy level is to be generated, cluster definition information as to a deeper hierarchy level is preferably generated. For example, suppose that a cluster 3 is classified into clusters 3-1, 3-2 and 3-3 in a spring graph of the hierarchy level 2. In this case, although it is not illustrated, cluster definition information associating the keywords of the frequency levels 1 and 2 that belong to the cluster 3 with the clusters 3-1, 3-2 and 3-3 may be generated.

Furthermore, a description will be given of a spring graph obtained by adding graphics based on the cluster coordinate information generated in step 306 to the spring graph of FIG. 6.

FIG. 8 is a diagram showing a specific example of such a spring graph. As illustrated, in this spring graph, clusters are formed, each of which gathers similar keywords into a single group.

In this example, a graphic that surrounds the entire graph indicates that this spring graph is a spring graph of the hierarchy level 1 (in FIG. 8, simply described with “level 1”). As is clear from the cluster definition information shown in FIG. 7, “Perl,” “Ruby” and “Python” belong to the cluster 1, “html” and “Java” belong to the cluster 2, and “C,” “C++” and “VC++” belong to the cluster 3. Accordingly, a graphic indicating the cluster 1 is drawn so as to surround the nodes representing “Perl,” “Ruby” and “Python,” and a graphic indicating the cluster 2 is drawn so as to surround the nodes representing “html” and “Java.” Moreover, a graphic indicating the cluster 3 is drawn so as to surround the nodes representing “C,” “C++” and “VC++.” Here, once the clusters are determined in the aforementioned manner, coordinate information that becomes a source of a spring graph of a lower hierarchy level may be stored for each of the clusters.

Next, a description will be given of the operation for transmitting a search query to the search server 20 upon receipt of operation information on a spring graph from the client 10.

To begin with, operations that can be performed by a user on the spring graph will be listed below.

Firstly, there is an operation to change the keyword group that the user inputs to the search server 20. This operation further includes an operation to select a cluster and an operation to change a hierarchy level. The operation to select a cluster is performed for determining a search target category. The operation to change a hierarchy level is performed when a user desires to find a more detailed keyword (such as a keyword in a specialized field).

Secondly, there is an operation to change the spring graph itself. This operation includes operations to move a node between clusters, to merge a node with another node, and to delete a node. The operation to move a node between clusters is performed for changing an AND condition or a NOT condition in a search query. The operation to merge a node with another node is performed for coupling keywords corresponding to the nodes under an OR condition. The operation to delete a node is performed for ignoring a keyword corresponding to the node. Moreover, the operation to change the spring graph itself may be an operation to change the weighting of feature values. Since this operation influences the degree of co-occurrence between keywords, it results in a change of the clusters. In addition, an operation to add or delete an edge may be performed as the operation to change the spring graph itself.

FIG. 9 is a flowchart showing an operation example of the search support server 30 in accordance with the aforementioned user operations.

When operation information is transmitted from the client 10, the receiver 31 receives the operation information and then passes the operation information to the operation determination unit 42 in the search support server 30 (step 321).

Then, the operation determination unit 42 determines, on the basis of the operation information received by the receiver 31, the content of an instruction issued by the user (step 322).

Suppose that, as a result of the determination, it is determined that the content of the instruction is an instruction of a hierarchy level change. This determination can be made by confirming that the information passed from the receiver 31 includes identification information of the selected cluster and information for changing a hierarchy level. In this case, the operation determination unit 42 notifies the display information generation unit 39 that the instruction of a hierarchy level change has been issued. Then, the display information generation unit 39 generates display information for displaying the spring graph by use of coordinate information that corresponds to the hierarchy level one level lower than the hierarchy level of the spring graph including the selected cluster and that corresponds to the selected cluster (step 323). The coordinate information here is selected from the coordinate information stored in the coordinate information storage unit 36. The display information generation unit 39 then passes the generated display information to the transmitter 41 (also step 323).
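One way to picture the selection of coordinate information in step 323 is a lookup keyed by hierarchy level and parent cluster, as in the following Python sketch. The layout of `coordinate_store`, its sample contents and the function name are assumptions for illustration and do not describe the actual organization of the coordinate information storage unit 36.

```python
# Hypothetical store: (hierarchy level, parent cluster) -> node coordinates.
coordinate_store = {
    (1, None): {"C": (5.0, 10.0), "Java": (11.0, 0.0)},
    (2, 3): {"C": (0.0, 0.0), "scanf": (1.0, 0.5), "template": (4.0, 3.0)},
}

def coordinates_after_level_change(store, current_level, selected_cluster):
    """Return coordinates one hierarchy level lower, for the selected cluster."""
    return store.get((current_level + 1, selected_cluster))

# Selecting cluster 3 on the level 1 graph yields the level 2 coordinates.
level_2_coordinates = coordinates_after_level_change(coordinate_store, 1, 3)
```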

Then, the transmitter 41 transmits the display information to the client 10 (step 324). The spring graph is thereby displayed on an unillustrated display of the client 10.

Furthermore, suppose that the content of the instruction is an instruction of a node change. This determination can be made by confirming that the information passed from the receiver 31 includes identification information of the selected node. In this case, the operation determination unit 42 passes the information passed from the receiver 31 to the cluster re-generation unit 43, and then the cluster re-generation unit 43 determines the content of the node change (step 325).

Here, suppose that the information passed from the operation determination unit 42 includes identification information of a node and coordinate information of the node. This is a case where an operation to move a certain node is performed, and the identification information of the node of the moving target and the coordinate information of the moving destination are passed.

In this case, the cluster re-generation unit 43 refers to the coordinate information storage unit 36 and then specifies a cluster including the coordinate information of the moving destination. In the meantime, the cluster re-generation unit 43 refers to the cluster definition information storage unit 38 and then specifies a cluster originally including the node of the moving target. Then, the cluster re-generation unit 43 compares these clusters. If these clusters are different as a result of the comparison, the cluster re-generation unit 43 determines that the content of the node change is the moving of the node between the clusters. Thereafter, the cluster re-generation unit 43 rewrites, in the cluster definition information stored in the cluster definition information storage unit 38, the identification information of the cluster associated with the keyword corresponding to the node of the moving target with the identification information of the cluster of the moving destination (step 326).
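The comparison and rewrite of step 326 can be sketched in Python as follows. Representing each cluster graphic as an axis-aligned rectangle, and the names `cluster_at` and `move_node`, are assumptions made for illustration; the description does not fix the shape of the cluster graphics.

```python
def cluster_at(cluster_regions, point):
    """Find the cluster whose (assumed rectangular) graphic contains the point."""
    x, y = point
    for cluster_id, (x0, y0, x1, y1) in cluster_regions.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return cluster_id
    return None

def move_node(cluster_definition, cluster_regions, keyword, destination):
    """Rewrite the keyword's cluster if the destination lies in another cluster."""
    target = cluster_at(cluster_regions, destination)
    if target is not None and target != cluster_definition[keyword]:
        cluster_definition[keyword] = target
        return True
    return False

# Hypothetical cluster regions and cluster definition information.
cluster_regions = {1: (-1, -1, 2, 2), 2: (9, -1, 12, 2)}
cluster_definition = {"Perl": 1, "Ruby": 1, "html": 2, "Java": 2}
moved = move_node(cluster_definition, cluster_regions, "Perl", (10.5, 1.0))
```

In this sketch, dropping “Perl” inside the region of cluster 2 changes its entry from cluster 1 to cluster 2, while a drop inside its original cluster leaves the definition untouched.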

In addition, the cluster re-generation unit 43 refers to the coordinate information storage unit 36 and then determines whether or not coordinate information of a different node exists within a predetermined distance from the coordinate information of the moving destination. When it is determined as a result that coordinate information of a different node exists, the cluster re-generation unit 43 determines that the content of the node change is a merger of the node of the moving target with the different node (the node of the moving destination). Then, the cluster re-generation unit 43 updates the cluster definition information stored in the cluster definition information storage unit 38 (step 327). Specifically, the cluster re-generation unit 43 merges the correspondence between the cluster and the keyword corresponding to the node of the moving target with the correspondence between the cluster and the keyword corresponding to the node of the moving destination. As a result, the cluster re-generation unit 43 changes these correspondences to a correspondence between the cluster of the moving destination and a keyword group including the keyword corresponding to the node of the moving target and the keyword corresponding to the node of the moving destination. Then, the cluster re-generation unit 43 sets, in the frequency information stored in the frequency information storage unit 33, information obtained by averaging the information on the keyword group including the keyword corresponding to the node of the moving target and the keyword corresponding to the node of the moving destination (step 328). Specifically, the correspondence between the keyword corresponding to the node of the moving target and its appearance frequency and frequency level is merged with the correspondence between the keyword corresponding to the node of the moving destination and its appearance frequency and frequency level. As a result, these correspondences are changed to a correspondence between the keyword group including the keyword of the node of the moving target and the keyword of the node of the moving destination, the average value of the appearance frequencies of these keywords, and the frequency level corresponding to that average value.
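Steps 327 and 328 can be illustrated by the following Python sketch, which represents a keyword group as a tuple. The tuple representation, the sample frequencies, the helper `level_of` (including its level 3 fallback) and the function name `merge_nodes` are assumptions for illustration only.

```python
def level_of(frequency):
    """Frequency level for a frequency; level 3 below 40 is an assumption."""
    return 1 if frequency >= 70 else 2 if frequency >= 40 else 3

def merge_nodes(cluster_definition, frequency_info, moved, destination):
    """Merge the moved keyword into the destination keyword's node."""
    group = (destination, moved)
    # Step 327: the group belongs to the cluster of the moving destination.
    destination_cluster = cluster_definition.pop(destination)
    cluster_definition.pop(moved)
    cluster_definition[group] = destination_cluster
    # Step 328: average the appearance frequencies and re-derive the level.
    average = (frequency_info.pop(moved)[0] + frequency_info.pop(destination)[0]) / 2
    frequency_info[group] = (average, level_of(average))
    return group

# Hypothetical data: keyword -> cluster, keyword -> (frequency, level).
cluster_definition = {"C": 3, "VC++": 3, "Visual C++": 3}
frequency_info = {"C": (90, 1), "VC++": (80, 1), "Visual C++": (50, 2)}
group = merge_nodes(cluster_definition, frequency_info, "VC++", "Visual C++")
```

With these assumed values, the merged group carries an average appearance frequency of 65 and is therefore reclassified as frequency level 2.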

It should be noted that for the sake of simplification, an assumption is made herein that the moving of a node between clusters and a merger of nodes do not occur at the same time. However, it is possible to determine that such events occur at the same time and then to perform the processing in steps 326 to 328 simultaneously.

In addition, suppose that the information passed by the operation determination unit 42 includes identification information of a node and a null value as the coordinate information of the node. This is a case where an operation to delete a certain node is performed, and the identification information of the node of the deletion target is passed from the operation determination unit 42.

In this case, the cluster re-generation unit 43 deletes information related to the keyword corresponding to the node of the deletion target from the cluster definition information stored in the cluster definition information storage unit 38 (step 329). Then, the cluster re-generation unit 43 deletes information related to the keyword corresponding to the node of the deletion target from the frequency information stored in the frequency information storage unit 33 (step 330).
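The deletion case can be sketched in Python as follows. The dictionary layout of the operation information (`"node"` and `"coords"` keys) and the function name `apply_node_deletion` are assumptions; only the null-coordinate convention is taken from the description.

```python
def apply_node_deletion(operation, cluster_definition, frequency_info):
    """Delete a keyword when its coordinate information is a null value."""
    if operation["coords"] is not None:
        return False  # a move or merger, handled elsewhere
    keyword = operation["node"]
    cluster_definition.pop(keyword, None)  # step 329
    frequency_info.pop(keyword, None)      # step 330
    return True

# Hypothetical data for illustration.
cluster_definition = {"html": 2, "Java": 2}
frequency_info = {"html": (75, 1), "Java": (95, 1)}
deleted = apply_node_deletion({"node": "html", "coords": None},
                              cluster_definition, frequency_info)
```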

Furthermore, suppose that the content of the instruction determined in step 322 is a search query generation instruction. This determination can be made by confirming that the information passed from the receiver 31 includes identification information of the selected cluster and information to generate a search query. In this case, the operation determination unit 42 notifies the search query generation unit 44 of the content of the instruction. Then, the search query generation unit 44 generates a search query and passes the search query to the transmitter 41 (step 331). At this time, firstly, a keyword equivalent to an upper level node among the keywords that belong to the selected cluster is included as an AND condition. Secondly, a condition in which keywords equivalent to lower level nodes among the keywords that belong to the selected cluster are connected by “OR” is included as an AND condition. Thirdly, a keyword having the lowest relevancy with a keyword included in the selected cluster, among the keywords in a different cluster at the same hierarchy level as that of the selected cluster, is included as a NOT condition.

Then, the transmitter 41 transmits the search query to the search server 20 (step 332). The search server 20 thereby performs a search on the Internet on the basis of the search query and replies to the search support server 30 with a web page of the search result. Then, the search support server 30 transmits this search result to the client 10. The search result is thereby displayed on an unillustrated display of the client 10. Alternatively, the search server 20 may directly transmit the search result to the client 10.
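The three determinations made in step 322 can be summarized by a small Python sketch. The field names `node_id`, `cluster_id` and `action`, and the returned labels, are hypothetical; the description specifies only which kinds of information each instruction carries.

```python
def determine_instruction(operation_info):
    """Classify operation information received from the client (step 322)."""
    if "node_id" in operation_info:
        return "node_change"        # handled in steps 325 to 330
    if "cluster_id" in operation_info:
        if operation_info.get("action") == "change_level":
            return "level_change"   # handled in steps 323 and 324
        if operation_info.get("action") == "generate_query":
            return "query_generation"  # handled in steps 331 and 332
    return "unknown"
```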

Next, the aforementioned operation will be described by use of a specific example. It is to be noted that, in the description below, “Java” is a trademark or a registered trademark of Sun Microsystems, Inc., in the United States or other countries, and “Windows” is a registered trademark of Microsoft Corporation in the United States or other countries.

Firstly, a description will be given of a specific example of the operation in a case where the content of the instruction in step 322 is determined to be a hierarchy level change instruction.

FIGS. 10A and 10B are diagrams showing a specific example of a change in a spring graph in this case.

As shown in FIG. 10A, the spring graph of the hierarchy level 1 includes clusters 1, 2 and 3. Suppose that when a user sees this spring graph, the user wishes to acquire more specialized knowledge as to the cluster 3. In this case, as illustrated, the user is allowed to select the cluster 3 by clicking a region of the cluster 3. Here, a state in which the cluster 3 is selected is shown with the bold broken line indicating the region of the cluster 3 in FIG. 10A. Upon selection of the cluster 3, the user can then view the spring graph of the hierarchy level 2, which focuses on the keywords included in the cluster 3, as shown in FIG. 10B. Then, in this spring graph as well, clusters 3-1, 3-2 and 3-3, which are lower level clusters than the cluster 3, are generated as in the case of hierarchy level 1.

It should be noted that in this spring graph of the hierarchy level 2, the upper level nodes are displayed by use of larger graphics than those of the lower level nodes. Such a display configuration is made possible since the co-occurrence matrix is generated for each frequency level of keywords, and the frequency level of the keyword represented by a node can be provided in the node coordinate information stored in the coordinate information storage unit 36.

Here, a specific example of a search query generated when a cluster is selected on a spring graph is shown.

For example, suppose that a user wishes to perform a search for the cluster of “C,” “main,” “scanf” and “printf.” In this case, the user selects the cluster 3-2. A state in which the cluster 3-2 is selected is shown with the bold broken line indicating the region of the cluster 3-2 in FIG. 10B. Here, the keyword corresponding to the upper level node in the cluster 3-2 is “C.” Accordingly, the search query generation unit 44 first includes “C” in the search query as an AND condition. In addition, the keywords corresponding to the next upper level nodes are “main,” “scanf” and “printf.” The search query generation unit 44 thus includes a condition where “main,” “scanf” and “printf” are connected to each other by “OR” as an AND condition in the search query. Moreover, the other clusters of the same hierarchy level as that of the cluster 3-2 are the clusters 3-1 and 3-3. The keywords belonging to these clusters are “C++,” “template,” “class,” “VC++,” “windows,” and “.net.” Accordingly, the search query generation unit 44 lastly includes a condition where “C++,” “template,” “class,” “VC++,” “windows,” and “.net” are connected to each other by OR in the search query as a NOT condition. Specifically, the search query herein is as follows: (C)&(scanf|main|printf)&!(C++|class|template|VC++|windows|.net). Here, it is to be noted that “and” is expressed by “&,” “or” is expressed by “|” and “not” is expressed by “!.”
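The construction of this example query can be sketched in Python as follows. The function name `build_query`, the use of frequency levels to tell upper level nodes from lower level nodes, and the inclusion of every keyword of the sibling clusters in the NOT condition (as in this example, rather than only the keyword of lowest relevancy) are assumptions for illustration.

```python
def build_query(selected, others):
    """selected: {keyword: frequency level}; others: sibling-cluster keywords."""
    top_level = min(selected.values())
    upper = [kw for kw, level in selected.items() if level == top_level]
    lower = [kw for kw, level in selected.items() if level != top_level]
    # Upper level keywords are each included as an AND condition.
    parts = ["(" + kw + ")" for kw in upper]
    # Lower level keywords are connected by OR, then included as an AND condition.
    if lower:
        parts.append("(" + "|".join(lower) + ")")
    # Keywords of the other clusters are connected by OR under a NOT condition.
    if others:
        parts.append("!(" + "|".join(others) + ")")
    return "&".join(parts)

selected_cluster = {"C": 1, "scanf": 2, "main": 2, "printf": 2}
other_keywords = ["C++", "class", "template", "VC++", "windows", ".net"]
query = build_query(selected_cluster, other_keywords)
# query == "(C)&(scanf|main|printf)&!(C++|class|template|VC++|windows|.net)"
```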

Moreover, a description will be given of a specific example of the operation in a case where the content of the instruction is determined to be a node change instruction in step 322. In this case, the user is allowed to move a node between clusters, to merge a node with another node and to delete a node. Among these operations, the operation when the user moves a node between clusters will be exemplified herein.

FIG. 11 is a diagram showing a specific example of a change in a spring graph in this case.

In this example, as illustrated, the node indicating “Perl,” originally displayed within the graphic representing the cluster 1, is moved into the graphic representing the cluster 2. Accordingly, in step 326 of FIG. 9, the search support server 30 updates the cluster definition information managed by the cluster definition information storage unit 38. Here, since the node indicating “Perl” is moved to the cluster 2, an edge that did not exist originally is placed between the node indicating “Perl” and the node indicating “Java.” This is because the distance between the node indicating “Perl” and the node indicating “Java” becomes less than that in the case where the degree of co-occurrence is “20.” However, such a new edge does not have to be drawn.

FIG. 12 is a diagram showing a specific example of the cluster definition information after the information is updated in step 326.

As shown in FIG. 7, the cluster corresponding to the keyword “Perl” is the cluster 1 before the node indicating “Perl” is moved into the graphic representing the cluster 2. However, after the operation shown in FIG. 11 is performed, the cluster that corresponds to the keyword “Perl” becomes the cluster 2, as shown in the bold frame in FIG. 12.

In addition, in this example, since “VC++” is an abbreviation of “Visual C++,” the possibility that “Visual C++” appears as a node is extremely high. In this case, the user merges “VC++” with “Visual C++,” and the system connects these two words by OR and then performs a search. Specifying such a query directly as keywords would be very troublesome work for the user and is not realistic. However, such troublesome work can be reduced and the search condition can be optimized by providing a system that automatically generates a search condition via a spring graph.

The present embodiment has been described above.

Incidentally, in this embodiment, the graph generation unit 35 is configured to generate display information for displaying a spring graph on the basis of only coordinate information. This is because the co-occurrence information that is the base of the coordinate information includes a frequency level of each keyword, and thus, in other words, the frequency level information is provided in the coordinate information. However, in a case where a frequency level is not included in the co-occurrence information and is not included in the coordinate information either, the graph generation unit 35 may refer to a frequency level stored in the frequency information storage unit 33 when generating the spring graph.

Moreover, a search query generated by the search query generation unit 44 in this embodiment is only an example. The search query may be one that includes a keyword represented by an object belonging to the selected cluster as any one of an AND condition and an OR condition, and also that includes a keyword represented by an object that belongs to a cluster different from the selected cluster as a NOT condition. Furthermore, at this time, whether a keyword is to be included as an AND condition or an OR condition may be determined in advance in accordance with the appearance frequency of the keyword.

As described above, in this embodiment, when a search is performed on the basis of a search word, a spring graph generated on the basis of a keyword group including the search word and a related word extracted from the search result is provided in addition to a list of web pages of the search result. Then, the user is allowed to change the relevancy between keywords by operating the spring graph and to automatically generate a search condition by selecting a predetermined range of the spring graph. That is, a search that conforms to the request of a user is made possible without requiring the user to be aware of a complicated search condition.

Specifically, the following can be enabled according to each of the components.

Firstly, a spring graph generated on the basis of a keyword group extracted from the search result is provided in addition to a list of web pages of the search result. The user is thereby allowed to know the validity of the search word used for the search and its relevancy with a related word to be used as a search word in the next search.

Moreover, in this embodiment, a relationship between keywords is visualized by use of a spring graph, and a user is allowed to operate the spring graph. Thereby, in addition to an AND condition, which is frequently used by a user, an OR condition (words having the same meaning) and a NOT condition (a word not related to the field) can be automatically generated.

Furthermore, in this embodiment, a spring graph is configured to include a hierarchy structure, and a concept of upper and lower levels is given to keywords. The user is thereby allowed to adjust the depth of the content of a web page that the user wishes to search for.

In addition, when a large number of search words are specified as AND conditions, there occurs a problem that a web page that is actually useful to the user is not included in a search result since one of the search words is not included in the web page. In this respect, the number of search words used for AND conditions is suppressed by generating a large number of NOT conditions by using the characteristics of a spring graph. Accordingly, only web pages that are of interest to the user can be searched for. It should be noted that there may be a case where a web page that is useful to the user is not included in the search result since a NOT condition is specified. However, a keyword having a degree of co-occurrence with the keyword of the search target equal to or greater than a constant value can be removed from the NOT condition.
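The last point, removing keywords from the NOT condition when they co-occur strongly with the search target, can be sketched in Python as follows. The function name `prune_not_keywords`, the constant value of 20 and the sample degrees of co-occurrence are assumptions for illustration.

```python
def prune_not_keywords(target, candidates, cooccurrence, limit):
    """Drop NOT candidates whose co-occurrence with the target reaches the limit."""
    return [
        kw for kw in candidates
        if cooccurrence.get(frozenset((target, kw)), 0) < limit
    ]

# Hypothetical degrees of co-occurrence with the search target "C".
cooccurrence = {frozenset(("C", "C++")): 45, frozenset(("C", "windows")): 5}
not_keywords = prune_not_keywords("C", ["C++", "windows", ".net"],
                                  cooccurrence, limit=20)
```

Under these assumed values, “C++” is no longer excluded, so that pages mentioning both “C” and “C++” can still appear in the search result.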

This embodiment is applicable in the following cases.

Firstly, there is a case where the validity of a search word is unknown. In this case, the user can know a different search word that is useful for the search from the position of the original search word in the spring graph.

Secondly, there is a case where many search words are general words, so that a large number of search results exist, but multiple search words are needed. According to the embodiment, the user can perform a search simply with a large number of search words without paying attention to the search conditions such as AND, OR and NOT. As a result, the user can easily narrow down the search target. Although a large number of AND conditions are needed in this case, the user does not have to find search words by himself or herself.

Lastly, a description will be given of a hardware configuration of a suitable computer to which the embodiment of the present invention is applied. FIG. 13 is a diagram showing an example of such a hardware configuration of the computer. As illustrated, the computer includes a central processing unit (CPU) 90 a, which is arithmetic means, a main memory 90 c connected to the CPU 90 a via a mother board (M/B) chipset 90 b, and a display system 90 d likewise connected to the CPU 90 a via the M/B chipset 90 b. Furthermore, a network interface 90 f, a magnetic disk device (HDD) 90 g, a sound system 90 h, a keyboard/mouse 90 i, and a flexible disk drive 90 j are connected to the M/B chipset 90 b via a bridge circuit 90 e.

Here, each constituent element is connected via a bus in FIG. 13. For example, the CPU 90 a and the M/B chipset 90 b, and the M/B chipset 90 b and the main memory 90 c are connected to each other via a CPU bus. Furthermore, the M/B chipset 90 b and the display system 90 d may be connected via an accelerated graphics port (AGP), but when the display system 90 d includes a PCI Express video card, the M/B chipset 90 b and this video card are connected via a PCI Express (PCIe) bus. Moreover, PCI Express, for example, can be used to connect the network interface 90 f to the bridge circuit 90 e. In addition, serial AT attachment (ATA), parallel ATA or peripheral component interconnect (PCI), for example, can be used to connect the magnetic disk device 90 g to the bridge circuit 90 e. In addition, universal serial bus (USB) can be used to connect the keyboard/mouse 90 i and the flexible disk drive 90 j to the bridge circuit 90 e.

The present invention may be implemented by hardware alone or software alone. Furthermore, the present invention may be implemented by both hardware and software. In addition, the present invention can be implemented as a computer, a data processing system, and a computer program. This computer program can be stored in a computer-readable medium to be offered. Here, applicable media include an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system (device or apparatus), or a propagated medium. Moreover, examples of media which are readable by a computer include a semiconductor, a solid-state storage device, a magnetic tape, a removable computer diskette, a random access memory (RAM), a read only memory (ROM), a rigid magnetic disk, and an optical disk. Examples of the optical disk at the present time include a compact disc-read only memory (CD-ROM), a compact disc-read/write (CD-R/W), and a DVD.

According to the present invention, a user is allowed to specify a keyword appropriate for a search through a simple operation and to perform the search in accordance with a user request.

As described above, the description has been given of the embodiment of the present invention; however, the technical scope of the present invention is not limited to the aforementioned embodiment. It is obvious to those skilled in the art that various modifications can be made and that alternative aspects can be adopted, without departing from the spirit and the scope of the present invention.

1. An apparatus for supporting a document data search based on akeyword, comprising: an extraction unit for extracting a plurality ofkeywords from search target document data; a graph generation unit forgenerating a graph including a plurality of objects respectivelyrepresenting the plurality of keywords extracted by the extraction unitand being classified into a plurality of clusters; and a searchcondition statement generation unit for generating a search conditionstatement by use of a keyword, in accordance with a user operation toselect a specific cluster among the plurality of clusters in the graphgenerated by the graph generation unit, the keyword being represented byan object that belongs to the specific cluster.
2. The apparatus according to claim 1, wherein the graph generation unit generates the graph in which the plurality of objects are arranged respectively at such positions that a distance between two objects of the plurality of objects corresponds to a degree of co-occurrence of two keywords represented by the two objects, and in which the plurality of objects are classified into the plurality of clusters on the basis of information on the positions.
3. The apparatus according to claim 1, wherein among keywords that appear in the search target document data, the extraction unit extracts keywords that appear at a specific level of frequency, as the plurality of keywords corresponding to the specific level, and in accordance with a user operation to specify the specific level, the graph generation unit generates the graph including a plurality of objects respectively representing the plurality of keywords corresponding to the specific level and being classified into a plurality of clusters.
4. The apparatus according to claim 1, wherein, in accordance with a user operation on a specific object included in the graph, the graph generation unit adds a change related to the specific object to the graph.
5. The apparatus according to claim 4, wherein the change related to the specific object is a change to cause the specific object belonging to a first cluster to belong to a second cluster.
6. The apparatus according to claim 4, wherein the change related to the specific object is to merge the specific object and an object different from the specific object and thereby to obtain a single object representing a keyword represented by the specific object and a keyword represented by the different object.
7. The apparatus according to claim 4, wherein the change related to the specific object is a change to delete the specific object.
8. The apparatus according to claim 1, wherein the search condition statement generation unit generates the search condition statement including the keyword represented by the object belonging to the specific cluster as any one condition of an AND condition and an OR condition, which is determined previously in accordance with appearance frequency of the keyword.
9. The apparatus according to claim 8, wherein the search condition statement generation unit generates the search condition statement including, as a NOT condition, a keyword represented by an object that belongs to a cluster other than the specific cluster.
10. An apparatus for supporting a document data search based on a keyword, comprising: an extraction unit for extracting a plurality of keywords from search target document data; a graph generation unit for generating a graph including a plurality of objects respectively representing the plurality of keywords extracted by the extraction unit, the plurality of objects being arranged respectively at such positions that a distance between two objects of the plurality of objects corresponds to a degree of co-occurrence of two keywords represented by the two objects, and the plurality of objects being classified into a plurality of clusters on the basis of information on the positions; and a search condition statement generation unit for generating a search condition statement in accordance with a user operation to select a specific cluster among the plurality of clusters in the graph generated by the graph generation unit, the search condition statement including a keyword represented by an object that belongs to the specific cluster, as any one condition of an AND condition and an OR condition, and also a keyword represented by an object that belongs to a cluster other than the specific cluster, as a NOT condition.
11. A method of supporting a document data search based on a keyword, comprising the steps of: extracting a plurality of keywords from search target document data; generating a graph including a plurality of objects respectively representing the extracted plurality of keywords and being classified into a plurality of clusters; and in accordance with a user operation to select a specific cluster among the plurality of clusters in the generated graph, generating a search condition statement by use of a keyword represented by an object that belongs to the specific cluster.
12. A method of supporting a document data search based on a keyword, comprising the steps of: extracting a plurality of keywords from search target document data; generating a graph including a plurality of objects respectively representing the extracted plurality of keywords and being classified into a plurality of clusters; in accordance with a user operation for a specific object included in the generated graph, adding a change related to the specific object to the graph; and in accordance with a user operation to select a specific cluster among the plurality of clusters in the graph after the change is added to the graph, generating a search condition statement by use of a keyword represented by an object that belongs to the specific cluster.
13. A program causing a computer to function as an apparatus for supporting a document data search based on a keyword, the program causing the computer to function as: an extraction unit for extracting a plurality of keywords from search target document data; a graph generation unit for generating a graph including a plurality of objects respectively representing the plurality of keywords extracted by the extraction unit and being classified into a plurality of clusters; and a search condition statement generation unit for generating, in accordance with a user operation to select a specific cluster among the plurality of clusters in the graph generated by the graph generation unit, a search condition statement by use of a keyword represented by an object that belongs to the specific cluster.
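
For illustration only, the search condition statement generation recited in claims 8 to 10 might be sketched as follows. This is a minimal sketch and not the claimed implementation. The function name `build_search_condition`, the `and_threshold` parameter, the rule that keywords at or above the threshold become AND conditions while the rest become OR conditions, and the textual query syntax are all assumptions introduced here; the claims state only that the AND/OR choice is determined previously in accordance with appearance frequency.

```python
from typing import Dict, List


def build_search_condition(
    clusters: Dict[str, List[str]],   # cluster name -> keywords in that cluster
    frequency: Dict[str, int],        # keyword -> appearance frequency
    selected: str,                    # name of the cluster the user selected
    and_threshold: int,               # hypothetical cut-off between AND and OR
) -> str:
    """Build a query string from the selected cluster (hypothetical syntax)."""
    selected_keywords = clusters[selected]
    selected_set = set(selected_keywords)

    # Assumption: frequent keywords are required (AND), rarer ones optional (OR).
    and_terms = [k for k in selected_keywords
                 if frequency.get(k, 0) >= and_threshold]
    or_terms = [k for k in selected_keywords
                if frequency.get(k, 0) < and_threshold]

    # Keywords of every other cluster become NOT conditions.
    not_terms = [k
                 for name, keywords in clusters.items()
                 if name != selected
                 for k in keywords
                 if k not in selected_set]

    parts = list(and_terms)
    if or_terms:
        parts.append("(" + " OR ".join(or_terms) + ")")

    query = " AND ".join(parts)
    for k in not_terms:
        query += f" NOT {k}"
    return query.strip()


if __name__ == "__main__":
    clusters = {"A": ["java", "coffee", "bean"], "B": ["island"]}
    frequency = {"java": 10, "coffee": 3, "bean": 2, "island": 5}
    print(build_search_condition(clusters, frequency, "A", and_threshold=5))
```

With the sample data above, selecting cluster "A" yields a statement that requires "java", accepts either "coffee" or "bean", and excludes "island", the keyword of the unselected cluster.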